Skip to main content

Data and Pretreatment

The data utilized comprises all previously mentioned and replicated datasets. These are updated on a weekly basis, with no additional preprocessing applied, apart from the rebasing to 100 on row data, as outlined in the section detailing the replication of certain variables.

It was then applied a MinMaxScaler with a range of (0, 1) as a normalization used to scale the values of independent variables to lie within the interval [0, 1]. It is a preprocessing step commonly applied before training machine learning or deep learning models to improve convergence and model performance.

MinMaxScaler used transforms each independent variables values using the formula:

MinMaxScaler

I developed a fully automated Python script that consolidates all tasks required to update, import, and save files for each independent variable.

For reporting purposes, the script is executed every Wednesday, ensuring that data from the preceding Monday is available, thereby operating on a D+2 basis.